Planning with Large Language Models for Code Generation
We were joined by the authors Shun Zhang and Zhenfang Chen today.
We first highlighted some questions related to the paper:
- Beam search vs. MCTS: see results in the graph below. Intuitively, beam search is much faster, but MCTS puts more emphasis on long-term results (by doing full rollouts)
- Maybe you can combine sampling with beam search? For example, sample first, and then run beam search.
- The MCTS sampling approach is a bit difficult to parallelize.
- With longer computation time, sampling + filtering is able to reach higher pass rates (the increase is roughly linear).
- [Figure: computation time plot]

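The sampling + filtering baseline discussed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the candidate pool, the `TESTS` list, and the names `pass_rate` and `sample_and_filter` are all hypothetical, and the "model" is replaced by a random draw from a fixed pool of Python callables.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical stand-ins for programs sampled from a language model.
# A real system would generate source code and run it in a sandbox.
CANDIDATES: List[Callable[[int], int]] = [
    lambda n: n * 2,       # wrong
    lambda n: n * n,       # correct: square of n
    lambda n: n + n,       # wrong
    lambda n: abs(n) * n,  # wrong for negative inputs
]

# Hypothetical public test cases (input, expected output) for "square n".
TESTS: List[Tuple[int, int]] = [(0, 0), (2, 4), (-3, 9), (5, 25)]


def pass_rate(program: Callable[[int], int], tests: List[Tuple[int, int]]) -> float:
    """Fraction of test cases the program passes; exceptions count as failures."""
    passed = 0
    for x, expected in tests:
        try:
            if program(x) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(tests)


def sample_and_filter(num_samples: int, seed: int = 0) -> Callable[[int], int]:
    """Draw num_samples candidates, then keep the one with the highest pass rate."""
    rng = random.Random(seed)
    samples = [rng.choice(CANDIDATES) for _ in range(num_samples)]
    return max(samples, key=lambda p: pass_rate(p, TESTS))
```

Drawing more samples raises the chance that a fully passing program is among them, which is the trade-off against computation time noted above. Each sample is independent, so this loop is also easy to parallelize.
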
Ideas around test cases:
- Test cases are generated from the natural language description alone
- Would it be helpful to put more test cases in the prompt?
- Is it easier to generate test cases than to generate code?
- Maybe we could train another network to adversarially generate test cases?
- How about using more assert statements? The authors noted that the model then doesn’t write code; it just generates more test cases.
- Relevant work: CodeT (https://arxiv.org/abs/2207.10397), AlphaCode
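
To make the test-case ideas above concrete, here is a small sketch of scoring a candidate program against model-generated assert statements. Everything in it is an assumption for illustration: `CANDIDATE_SOURCE`, `GENERATED_ASSERTS`, and `run_asserts` are hypothetical names, and no model is actually called.

```python
from typing import List

# Hypothetical candidate program, as source text a model might emit.
CANDIDATE_SOURCE = """
def is_even(n):
    return n % 2 == 0
"""

# Hypothetical model-generated test cases, written as assert statements.
GENERATED_ASSERTS = [
    "assert is_even(2)",
    "assert not is_even(3)",
    "assert is_even(0)",
    "assert is_even(7)",  # a wrong test: generated tests can be wrong too
]


def run_asserts(source: str, asserts: List[str]) -> List[bool]:
    """Run each assert against the candidate; True means the assert passed.

    exec on untrusted code is unsafe; a real system would use a sandbox.
    """
    results = []
    for statement in asserts:
        namespace: dict = {}
        try:
            exec(source, namespace)     # define the candidate's functions
            exec(statement, namespace)  # run one generated test
            results.append(True)
        except Exception:
            results.append(False)
    return results
```

Here the correct program fails the last assert because the test itself is wrong, which is one reason generated test cases need some form of filtering or agreement check before they are trusted.
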
Ideas around human feedback:
- The premise: given two options (pi * radius * radius vs. 3.14 * radius * radius), which one would a human prefer?
- This may require a lot of feedback.
- How do we best make use of human feedback?
- People may have their own preferences (e.g., coding style).
- Passing test cases may not be what the human actually wants.
- You can also use human feedback on top of the existing model.
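
The pi vs. 3.14 premise can be illustrated with a short sketch. The function names, the test radius, and the tolerance below are hypothetical choices, picked only to show that two programs can pass the same test while a human would still prefer one of them.

```python
import math


def area_with_pi(radius: float) -> float:
    """Circle area using the library constant."""
    return math.pi * radius * radius


def area_with_literal(radius: float) -> float:
    """Circle area using a hard-coded approximation of pi."""
    return 3.14 * radius * radius


def passes(fn, radius: float, expected: float, tol: float) -> bool:
    """A hypothetical test case with an absolute tolerance."""
    return abs(fn(radius) - expected) <= tol


# Both versions pass a loose test at radius 1 (expected 3.14, tolerance 0.01),
# so pass rate alone cannot tell them apart.
both_pass = passes(area_with_pi, 1.0, 3.14, 0.01) and passes(area_with_literal, 1.0, 3.14, 0.01)

# The gap grows with the radius, which the loose test never exercises.
gap_at_100 = abs(area_with_pi(100.0) - area_with_literal(100.0))
```

A preference signal from a human (or a stricter test) is what separates the two versions here; the test-based reward gives both the same score.
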
Ideas around reward function design:
- The authors experimented with minimizing the length of the code and maximizing the number of comments in the code.
- Does increased pass rate really mean we’re doing program synthesis? Is it really better to pass more test cases even if we didn’t solve the problem fully?
- Would using pass@k help instead of using pass rate? pass@k is a sparser signal, so the authors chose pass rate instead.
- Could we integrate the stack trace in the reward? Would it help?
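
The difference between a dense and a sparse reward can be sketched as below. The function names are hypothetical, and the notes do not specify exactly how pass@k would be turned into a reward; `pass_at_k` here is the standard unbiased estimator used for evaluation (from the Codex paper), shown only for comparison.

```python
from math import comb
from typing import List


def pass_rate_reward(results: List[bool]) -> float:
    """Dense reward: fraction of test cases passed by one program."""
    return sum(results) / len(results)


def all_pass_reward(results: List[bool]) -> float:
    """Sparse reward: 1 only if the program passes every test case."""
    return 1.0 if all(results) else 0.0


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n = programs sampled, c = programs that pass all tests, k = budget.
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing program
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For a program that passes three of four test cases, `pass_rate_reward` returns 0.75 while `all_pass_reward` returns 0.0, so the search gets no partial credit under the sparse signal. That also connects to the question above: a higher pass rate does not guarantee the problem is fully solved.
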
Could we use this framework without prefixes?
- Infilling?